Efficient tight-binding Monte Carlo structural sampling of complex materials 
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While recent work towards the development of tight- 
binding and ab-initio algorithms has focused on molecular 
dynamics, Monte Carlo methods can often lead to better re- 
sults with relatively little effort. We present here a multi-step 
Monte Carlo algorithm that makes use of the possibility of 
quickly evaluating local energies. For the thermalization of a 
1000-atom configuration of a-Si, this algorithm gains about 
an order of magnitude in speed over standard molecular dy- 
namics. The algorithm can easily be ported to a wide range of 
materials and can be dynamically optimized for a maximum 
efficiency. 

Monte Carlo techniques are extensively used in statis- 
tical physics, but have received relatively little attention 
in the materials theory community. Most of the recent 
efforts towards the development of fast tight-binding or 
ab-initio codes, for example, has been done within the 
framework of molecular dynamics (MD) Jl|-|^]. Since all 
atoms move simultaneously within MD, there are limita- 
tions to the approaches that can be used. In particular, 
the most promising order- N MD-based methods are 
only applicable to materials with a well-defined electronic 
gap, reducing significantly the number of problems that 
can be studied with such algorithms. Monte Carlo (MC) 
techniques, on the other hand, allow a much wider range 
of steps to be used and provide an alternative to MD for 
a number of problems. 

In this Letter, we introduce such a Monte Carlo algo- 
rithm. For a 1000-atom tight-binding model, we find a 
gain in efficiency of an order of magnitude over MD with- 
out the need for a perfectly clean electronic gap or any 
other special consideration. We first describe the core of 
the algorithm and then present some of the details asso- 
ciated with computing the intermediate steps. Examples 
are given using a series of amorphous silicon models de- 
scribed by the tight-binding interaction of Kwon et al. [Q . 

The general problem that we address here is the gen- 
eration of a set of configurations, defined by the atomic 
positions X = {x\, . . . , afjv}, sampled according to the 
microcanonical ensemble, i.e., with a probability propor- 
tional to their Boltzmann weight: 



P(X) ~ eM-PE(X)), 



(1) 



where E(X) is the total potential energy of configuration 
X and (3 = (ki,T)~ 1 , the inverse of Boltzmann's constant 
kb times the temperature T. This sampling can be done 
using standard MD or MC. 

In Monte Carlo, a markovian chain of configurations 
is generated through a sequence of trial moves or ele- 
mentary steps. A standard move is obtained by ran- 
domly displacing a configuration Xi to a trial position 
X' i+l = Xi + SX, shifting one or all atoms in the box. 
The trial move is then accepted with a probability given 
by the Metropolis criterion M 



P a = Min[l,exp(-/3 SE)} 



(2) 



where SE = E{X' i+l ) — E(Xi) is the energy difference 
between the trial and the initial positions. 

A straightforward implementation of this standard 
Monte Carlo algorithm requires thus one full energy cal- 
culation per trial step, making it usually much slower 
than a standard MD simulation. It is possible, however, 
to speed up the MC simulation significantly, especially 
as the cost of computing the total energy increases faster 
than linear with the system size. This improvement ex- 
ploits the fact that the energy difference between two 
configurations, SE, which differ only locally, can be es- 
timated in a much quicker way than the total energy, 
as discussed below. Using this approximate value, the 
acceptance ratio, Eq. |^, is modified as follows: 



P a = Min 
x Min 



l,exp SE\ 
l,exp (-06E + P6E 



(3) 



where /3 is an inverse temperature close to (3. Note that 
detailed balance is still strictly obeyed, irrespectively of 
the quality of the estimate and the value of $. This 
expression can be implemented by first using an accept- 
reject procedure based on the first factor, followed by an 
accept-reject procedure based on the second factor only 
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if the first one is accepted. The gain in doing this is that 
the first factor in this equation requires only an estimate 
of the energy difference, which is easily computed, while 
the second factor, that involves an expensive total energy 
calculation, is computed only if the first one is accepted. 
If the estimate is accurate, (3 can be chosen equal to 
(3 and the acceptance probability for the second factor 
is close to unity; effectively a full energy calculation is 
only required per accepted step rather than per trial step. 
Since rejected moves are now cheaper, the trial step size 
can be enlarged for an increased efficiency. 

A further reduction in the number of full energy calcu- 
lations is possible by making a sequence of M interme- 
diate displacement steps, each one accepted or rejected 
based on the estimate of the energy difference, followed 
by a single accept-reject procedure which requires a full 
energy calculation. The resulting acceptance ratio, which 
still strictly preserves detailed balance, becomes 



M 

Pa=\\ [Min 



1, exp 



x Min 



l,exp {^I35E -Y^PSE^j 



(4) 



These two variables, the number and typical size of the 
intermediate trial moves M , can then be optimized for 
speed. For large simulation cells, a good choice is to 
select a trial step size yielding a local acceptance ratio 
around 50%, and to maximize the number of interme- 
diate (cheap) trial moves M under the constraint that 
the global acceptance ratio does not fall below 50%. The 
resulting M depends on the accuracy of the energy esti- 
mates. 

Depending on the system (i.e., metal or insulator, crys- 
talline, disordered solid, or liquid), there are different 
types of Monte Carlo moves for which approximate en- 
ergy differences can be computed efficiently and accu- 
rately. In metals, for example, a MC move localized in 
real space can be combined with an embedded-atom po- 
tential, while a MC move localized in reciprocal space 
is more appropriate in conjunction with plane-waves ab- 
initio calculations. Here, we present an approach de- 
signed for semiconductors and insulators, and apply it 
to models of amorphous silicon described by the tight- 
binding interaction developed by Goodwin, Skinner and 
Pettifor |9) and modified by Kwon et al. 0. 

Within the tight-binding formalism, the total energy 
can be written as 



E = 



[VilHlVi) + E r + EoN a 



(5) 



where the first term represents the electronic contribution 
of the energy given by the sum of the occupied eigenval- 
ues of the tight-binding Hamiltonian H, the second term 
represents the ion-ion repulsion as well as the correction 



from the double counting of the electron-electron inter- 
action and Eq is a constant energy shift per atom, and 
N a is the total number of atoms in the system. Details 
of the potential and parameters used here can be found 
in Ref. §. 

We want to compute the approximate energy differ- 
ences between two systems that are close to each other. 
Because amorphous silicon is a semiconductor, we keep 
the difference between the two configurations localized in 
real space. Consider two configurations A and B with 
total energies Ea and Eb respectively, such that the 
atomic coordinates differ only in a small region around 
a point x rn ; for instance, in a Monte Carlo approach 
with single-atom displacements, it is convenient to select 
x m = (%i + x'i)/2, the geometrical center between the 
"active" region before and after the displacement jl0| . A 
spherical localization volume is constructed around x mi 
large enough to include the displaced atoms or atoms 
and its local environment, following the principle of near- 
sightedness of an equilibrium system [fill which asserts 
that the effects on the deformation on atoms outside the 
localization volume is small. As discussed below, for the 
model system used here, a sphere including about a hun- 
dred atoms around x m is found to be sufficient. 

This discussion may be written formally. Denoting I 
and II as the regions inside and outside the localization 
volume, the exact energy difference can be written as 



8E AB = Eb — Ea 



= {E B I + E b ii )-{E a i 



= {Eb 1 - E/) 



E a 

(E b h -Ea H 



To obtain an estimate of the approximate energy dif- 
ference, we assume that the contribution from region II 
for both configurations is be the same. This is a good 
approximation as long as the radius of the localized vol- 
ume is greater than the localization length of the density 
matrix of the system. The approximate energy difference 
SEa B is therefore simply given by 5Ea B = E B ! — Ea 1 ■ 
Furthermore, if the localization volume is big enough 
and remains stationary during the atomic movement, the 
boundary effects too can be neglected. We construct a 
local Hamiltonian for the localization region I before and 
after the movement with open boundary condition and 
calculate the band energies for the corresponding local 
Hamiltonian, the difference of which would give us the 
approximate change of the band energy in the two config- 
urations. Typically in our calculations we have used vol- 
umes with a radius of 7.5 A, which roughly corresponds 
to a hundred atoms. 

We emphasize that although the quality of the approx- 
imation is strongly related to the degree of localization, 
the algorithm works even if there are states in the gap. 
In this case, the error in the estimate increases but will 
be corrected for appropriately in the final MC step using 
the full energy calculation. In this sense, the algorithm 
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is much more stable under the presence of defects than a 
typical order-N MD. 

We test this scheme on three sample configurations of 
a-Si with 300, 500 and 1000 atoms, respectively. The 
initial cells were prepared using the optimized Wooten- 
Winer-Weaire |T3| bond-switching algorithm described in 
Rcf. Q. This method, which uses a Keating interaction 
potential (llj, produces the best a-Si models to date, 
from a structural and electronic point of view. 

First, we establish the validity of our approximate cal- 
culation of the energy difference between two nearby con- 
figurations. As shown in Fig. [l], where we compare the 
approximate energy, SEab, with the exact energy differ- 
ence obtained by direct diagonalization of the full cell, 
the error implicit in this approximation is about 0.03 eV, 
sufficiently small for the following simulations. The exact 
value of the error depends, of course, on a number of vari- 
ables: the distance between the original and the modified 
configuration, the size of the localization volume, and 
the local strain (which might delocalize some electronic 
states). The error appears, however, independent of the 
size of the simulation cell, which is to be expected be- 
cause of the aforementioned principle of nearsightedness. 
For the results plotted in Fig. [I], we selected a Monte 
Carlo step of 0.4 A, much larger than the value we used 
in our Monte Carlo simulations. 
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FIG. 1. Accuracy of the O(l) estimate of the energy dif- 
ference between two similar configurations as described in the 
text. The exact energy difference SE between two configu- 
rations, calculated using a direct diagonalization method, is 
plotted horizontally. The difference between the exact and es- 
timated energy differences is plotted vertically. The different 
symbols stand for the system sizes indicated in the figure. 

The efficiency of the various methods is determined by 
how fast each samples the phase space. One way to mea- 
sure this, is to estimate the thermalization time from a 
configuration which is out of equilibrium. Starting from 



the three configurations described above, which are opti- 
mized for the Keating potential at zero K, we relax the 
atomic positions at 300 K using two MC algorithms and 
an MD run. In the first MC approach, dubbed the stan- 
dard MC approach, a trial configuration X' i+1 is obtained 
from the old configuration Xi by adding a gaussian ran- 
dom number to each of the 3N degrees of freedom. The 
change in total energy SE = E(X' i+1 ) — E(X,j) is calcu- 
lated, requiring a full total energy computation. Next, 
the trial configuration is either accepted or rejected fol- 
lowing a Metropolis probability as described in Eq. (||). 
The spread of the gaussian distribution is chosen so to 
maximize the diffusion in phase space. For our simula- 
tions with N = 300 atoms, the atomic step size is 0.01 A; 
it decreases with increasing system size to about 0.0025 
A for a 1000-atom simulation. 

The second method approach, called the multi-local 
approach, is the Monte Carlo scheme proposed here: mul- 
tiple local displacements, accepted or rejected based on 
an estimate of the energy difference, followed by a single 
full total energy calculation to recover correct sampling 
of the Boltzmann distribution. Here, the size M of the 
sequence of local moves and the size of the atomic dis- 
placements are chosen so to maximize the diffusion in 
phase space; the values we used are M = 300 and a step 
size of 0.1 A. 

The standard MD simulation is performed at constant 
temperature, by rescaling the velocities at every time 
step. This modifies the trajectory but ensures a faster 
convergence than a more flexible constraint dynamics, 
damping considerably the thermal oscillations. In MD, 
the forces also need to be computed, in addition to the 
total energy. This requires the evaluation of the eigenvec- 
tors of the tight-binding Hamiltonian, a step which is not 
necessary in MC, and which increases the computational 
cost as well as the memory requirements significantly. In 
spite of these additional costs, MD is much faster than 
standard MC, since it is a second-order scheme which 
makes use of inertia. 

We compare the computational efficiency of these 
methods in figure |^, which shows the evolution of the 
total energy as a function of CPU time on a fast worksta- 
tion (DEC Alpha, 21264 architecture, 667 MHz) for the 
methods applied to three different size cells. All curves 
are fitted to the function 

E{t) - E{0) = a (1 - exp(-t/r) cos(ut - 0)) . (6) 

Here, the initial excess energy is given by a, the ther- 
malization time by r, and the oscillatory behavior in 
MD by ui and cf>; in the fits to the MC data, we took 
oj = cf> = since MC inherently does not show oscillatory 
behavior. Important for us are the thermalization times. 
The fitted parameters for the multi-local MC approach 
are t/N 3 = 21, 8.0 and 6.2 fis, respectively, for system 
sizes iV=300, 500 and 1000 atoms. For MD, we find 
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t/N 3 = 32, 35 and 49 fis, respectively. (The increase 
in t/N 3 for MD is probably due to the cache misses.) 
Clearly, the multi-local MC outperforms MD with an in- 
creasing factor as a function of system size; it is close to 
an order of magnitude faster for the 1000-atom cell. 
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FIG. 2. Total energy as a functibn S df CPU time, from top to 
bottom for system sizes N = 300, 500 and 1000. The dashed 
curves correspond to results obtained with the non-local MC 
approach, the solid lines are obtained with molecular dynam- 
ics. For the 300-atom model, the total energy with the stan- 
dard MC approach is plotted as the dash-dotted line. 

The total self-diffusion ^ [fi(t) — ^(0)] 2 correspond- 
ing to these runs is reported in Fig. |^. The MD curves 
show oscillatory behavior due to inertia, while the over- 
damped MC dynamics leads to smooth non-decreasing 
diffusion. The sudden jumps at irregular intervals are 
caused by activated events. The difference in the dis- 
placement at the end of the simulations tells that MC 
and MD have not reached the same local basin. 

The resulting atomic configurations show excellent 
structural and electronic properties; for the largest sam- 
ple, the angular spread is 10.6 degrees (with a nearest- 
neighbor cut-off anywhere between 2.7 A and 2.9 A), and 
the band gap is clean. A detailed study of the structural 
and electronic properties is in progress |15|. 

Besides its higher efficiency, the MC approach is also 
more flexible. It can easily concentrate the computa- 
tional effort on the activate parts of a heterogeneous sys- 
tem, for instance, an interfacial area in a heterostruc- 
ture, or the hydrogen atoms in a-Si:H. Furthermore, the 
efficiency can be raised by allowing well-chosen complex 
moves that involve more than single-atom displacements. 
In the case of a-Si, for example, we could include bond 
transpositions that dominate the low-temperature dy- 
namics [fill. 
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FIG. 3. The total self-diffusion £\ (f-(t) - r*;(0)) 2 as a 

function of CPU time, for system sizes N = 300 (bottom), 
N = 500 (middle) and N = 1000 (top). The solid curves cor- 
respond to MD and the dashed lines to the multi-local MC. 

In summary, we have presented here a multi-local 
Monte Carlo procedure that exploits the fact that good 
estimations of energy differences between relatively close 
configurations can often be obtained cheaply. This per- 
mits us to construct a two-stage algorithm which first 
accepts-rejects a large number of moves based on this ap- 
proximate calculation of the energy and then corrects the 
accumulated errors with a final accept-reject step using 
one full energy computation. This method is highly flexi- 
ble and can easily be adapted to a wide-range of problems 
including those for which order-N molecular dynamics is 
inapplicable. 

We have also presented an implementation of this al- 
gorithm for thcrmalization of tight-binding a-Si models. 
In this case, the multi-local MC outperforms MD in the 
exploration of phase space by up to an order of magni- 
tude. 
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